Compendium Deluxe 1

Text File | 1992-12-03 | 24.6 KB | 620 lines

CAS -- The 8051 C-Assembler

(0) Introduction

(a) Features

This is a free, full-featured one-pass 8051 assembler. It could very well be the first one-pass assembler for the popular MCS-51 family of microprocessors. What you get are the following features:

   * Separately assemblable files. There are two stages of assembly:
      - Pass 1: Creation of object files
      - Pass 1 1/3: Linking of object files
   * Segmentation
      - RELATIVE ADDRESSING supported for all segment types
   * Conditional assembly, with a C-like syntax. Example:

        if (Condition) {
           Assembly instructions...
        } else {
           Assembly instructions...
        }

   * Multiple statements per line with C-like syntax.
   * C-like expression syntax.
   * Command-line options similar to those of *NIX C compilers.
   * An extensive archive of real-life assembly language programs, including a multi-tasking library and an 8051 disassembler.

Plus, if you don't want to learn all the elaborate ins and outs of this tool right away, it is just as easy to use the first time out as any minimal assembler.

You simply will not find anything this extensive anywhere in the public domain. But it's yours, here, for free.

Also in the works: a compatible 8051 simulator kit for software developers. What makes this kit unique is that you can (and usually must) link in your own C code to define any arbitrary 8051 environment at all. This gives you the flexibility to simulate the 8051 in your favorite embedded application and even to simulate the I/O on a desktop. A Standard Environment file is included with the package.

(b) Design Philosophy

... everything is done in one pass. A clean distinction is made between the two phases of assembly:

   (a) creating segments and formatting image files,
   (b) mapping segments and resolving references to variable addresses.

An assembly language program will normally consist of a set of assembly language modules (or source files). Each will typically be named with the suffix ".s".
In addition, there will also be a set of files, with names ending in ".h", whose purpose is to provide common points of reference for declarations of objects in or related to modules. They are incorporated in *.s files using the "include" directive.

The first stage of assembly will create OBJECT files, whose names end in ".o": one for each assembly language module. For instance, a module named Kernel.s will be assembled to the object file Kernel.o.

The second stage will take all the object files that have been created and LINK them together. This process will consist mainly of completing the definitions of variables defined in one module and used in another, and in mapping the memory segments defined in each module onto a memory image.

These two stages correspond roughly to the first and second pass of a traditional two-pass assembler. But there are now two major differences:

   (a) the second stage can now be deferred. It is possible to assemble object files only, and defer the linking phase. Furthermore, it is possible to use the SAME object file in more than one project.
   (b) the second stage is now considerably shortened compared to the second pass of a traditional two-pass assembler, because object files tend to be much smaller than source files and because the assembler no longer has to process the assembly language itself by the second stage.

(1) Command line arguments

The cas assembler's command line basically follows that of a typical C compiler. In the examples:

   (a) cas -c kernel.s
   (b) cas -c math.s data.s stdio.s kernel.s
   (c) cas math.s data.s stdio.s kernel.s
   (d) cas -o data.hex math.o data.s stdio.o kernel.s

(a) will assemble the file kernel.s, creating kernel.o.

(b) will assemble all the files listed, creating .o files in the process. If a .o file is listed with the -c option, it is ignored.

(c) will assemble all the files listed, as in (b), and then link all the corresponding .o files.
The output file will take the same base name as the first file listed, and will have the suffix .hex. Therefore, the output in this example will be math.hex.

(d) will do the same as (c), but will name the output file data.hex.

If a .o file is listed in either of these two command lines it will be ignored during assembly, but will be used during linking.

(2) Directives

The following is a summary of the directives available in this language.

(a) FILE INCLUSION -- include "FILE"

This command will read the contents of the file named (FILE) into the current location of the current file. By convention, include files should have names ending in ".h" or ".i" and should only consist of declarations. Include files generally serve two purposes: to provide a place to store related constant definitions and declarations, and to declare the globally visible objects of an assembly language module.

(b) Setting current SEGMENT and LOCATION -- seg, at, org

At any point in scanning a *.s assembly language file, the assembler will recognize a current segment and current location. The latter can be referred to by the user as $. To see how these items can be set, look at the following examples:

   seg code
   seg xdata at 0x8000
   seg xdata org 0x8000
   org 50
   at 50

The first example sets the current segment to the type "code". The current location is left unspecified. THIS IS HOW RELATIVE ADDRESSING IS INITIATED. The actual address of the segment's start will not be determined when the object file is created, but is deferred until the object file is linked.

Why do things this way? One simple reason: MODULARITY. You can now define your own assembly language module, and convert it into an object file ready to be linked in with the rest of whatever program might be using it. You don't have to worry about the exact address where your memory segments will be located each time you include this module in a new program.
This makes it possible to create reusable libraries of common assembly language functions.

The second and third examples do exactly the same thing because "at" and "org" are synonymous. The latter is included only for compatibility with other assembly language programs and for familiarity's sake, but I strongly recommend using the former. It simply reads nicer. The effect of this operation is to set the current segment to "xdata" and the current location to 0x8000.

The last two examples are equivalent to one another and set the current location to 50 without changing the current segment.

At the very start of assembly, the current segment is set to the first segment ("code"), and the address is left indefinite.

When different modules are linked together, the linker will attempt to take all the segments of each type and place them in non-overlapping areas of memory, shifting the relative segments around as needed to accomplish this goal.

What if you want to control the placement of objects, say to exclude addresses 0 to 4000 hex? An easy way is to simply write up a module to the effect:

   seg code at 0
   ds 4000h

then assemble it separately and link it in with any program where you want to reserve this address space. The linker tries to place your segments in exclusive areas in as tight a fit as possible. So this module will result in the address space 0 to 4000 being excluded from the rest of your program.

The segment types supported by this 8051 assembler are the following:

   * code --- the 8051 code address space, ranges from 0 to ffff hex.
   * xdata -- the external data address space, same range.
   * data --- the internal data/register space. Ranges from 0 to ff. Only addresses under 80 hex can be used in mnemonics involving direct addressing.

Other segment types are internally used by the assembler. They are:

   * sfr ---- the Special Function Register space -- ranges from 80 to ff.
   * bit ---- the bit addressable address space.
     These comprise the individual bits in registers 20(hex) to 2f(hex), and the sfr addresses (hexadecimal) 80, 88, 90, 98, ..., f0, f8.

Defining a new segment with one of these types will result in an error.

(c) Defining new LABELS -- LABEL equ Exp, LABEL Type Exp, LABEL:
                           LABEL set Exp, LABEL = Exp

These operations are defined as follows:

   LABEL equ Exp    defines a constant value LABEL and sets it to the value Exp.
   LABEL Type Exp   defines a constant address "LABEL" of the indicated type and sets it to the address given by "Exp". The types recognized by this assembler are: code, xdata, data, sfr, and bit.
   LABEL:           sets a constant address "LABEL" to the current address in the current segment.
   LABEL set Exp    defines a variable, LABEL, and sets it to the value Exp.
   LABEL = Exp      the same thing as "set".

The following assembly language fragment is an illustration of these operations:

   seg code at 0
   Start: ds 0x4000
   Size equ $ - Start
   End code Start + Size

The first statement sets the current segment and location to "code" and 0. The next statement is preceded by the label "Start:". This is equivalent to the statement: Start code $. What it does is define "Start" as a code address, and set it to the current location (which is 0). Following this is an instruction to reserve 4000(hex) units (bytes) of storage. After this operation, the current location is 0x4000.

The third instruction sets the numerical constant "Size" to 0x4000 - 0, or just 0x4000. The final directive defines a code address with the name "End" and sets it to the address Start + Size (or just 0x4000).

Variables differ from constants in that they can be redefined. Constants cannot be redefined.

(d) Numeric labels

One can also define anonymous numeric labels, as in the following example:

   1: cjne A, #0, 1f
      inc A
      movx @DPTR, A
      inc DPTR
      mov A, @R1
      inc R1
      jz 2f
      sjmp 1b
   1: setb C
      ret
   2: clr C
      ret

Each occurrence of "1:" stands for a unique anonymous label, likewise for "2:".
Any number may be used in this way to denote an anonymous label.

When a label is referenced by the number followed by an "f", then the first matching numeric label IN THE CURRENT SEGMENT forward of the current location is being referred to. In the example above, 1f and 2f refer respectively to the occurrences of 1: and 2: toward the end of the example.

When a label is referenced by the number followed by a "b", then the first matching numeric label IN THE CURRENT SEGMENT behind the current location is being referred to. In the example above, 1b refers to the 1: at the top of the example.

Thus, this fragment is equivalent to the following:

   X1: cjne A, #0, Y1
       inc A
       movx @DPTR, A
       inc DPTR
       mov A, @R1
       inc R1
       jz Y2
       sjmp X1
   Y1: setb C
       ret
   Y2: clr C
       ret

This feature saves you from the burden of defining needless names for labels that really serve as nothing more than place-holders.

(e) Declaring GLOBAL labels -- global, public

Any constant directive:

   LABEL equ Exp
   LABEL Type Exp
   LABEL:

can be prefixed by "global" or "public" to result in:

   global LABEL equ Exp
   global LABEL Type Exp
   global LABEL:

or

   public LABEL equ Exp
   public LABEL Type Exp
   public LABEL:

What this does is make these labels visible to modules other than the one where these labels are defined. By default, all labels are visible only in the file where they are defined.

(f) Declaring EXTERNAL labels -- extern Type LABEL, ..., LABEL
                                 extern equ LABEL, ..., LABEL

For each global label defined in a *.s module file, a corresponding external declaration should be made in whatever other module this label is to be used. Typically, one will make these and other related declarations in a *.h file and then INCLUDE this file in whatever module needs the declarations.

The type must match the type of the label being referenced, if it is an address, or it must be "equ" if the label referenced is a numeric constant.
For example, if one declared global labels in a module Kernel.s as follows:

   public STACK_BASE data 0x80
   ...
   seg code
   public Spawn:
   ....
   public Resume:
   ...

one would generally make the corresponding declarations:

   extern data STACK_BASE
   extern code Spawn, Resume

in a header file (say, Kernel.h), and then include this file in any source module where the addresses STACK_BASE and Spawn might be needed.

(g) Memory ALLOCATION -- ds, rb, rw

The following operations can be used in any segment. They are generally used to allocate space for objects and so are generally used in conjunction with "LABEL:" type definitions. These are examples:

   seg code at 0
   BASIC_SEG: ds 0x4000
   seg xdata
   Byte: ds 1
   ByteArray: rb 5
   WordArray: rw 5

The first example reserves 0x4000 units (bytes) in the current segment for the variable BASIC_SEG and then increments the current location by 0x4000. Basically, this operation behaves as if the assignment "$ = $ + 0x4000" had just been carried out.

"ds" and "rb" are exactly equivalent, but the latter more descriptively states: reserve single-byte units. So the second example reserves 1 byte for the variable "Byte", and 5 bytes for "ByteArray".

NO MEMORY IMAGE IS GENERATED FOR ANY SPACE SKIPPED BY ds/rb/rw.

The third example is equivalent to:

   WordArray: rb 10

Each unit following a "rw" is a word, which consists of two bytes.

(h) Memory FORMATTING - db, dw

These operations can be used in the code segment only. They are the only directives that can generate memory images. The only other operations that generate memory image output are the 8051 mnemonics, which likewise are restricted to the code segment only.

The purpose served by these operations is mainly to initialize data. Examples:

   ByteArray: db 'a', 'b', 'c', 'd', 'e'
   String: db "This is a string", 0

In the following examples:

   db 0x20, "String", 'c'
   dw 0x1234, 0x5678

the first operation lays out the byte 0x20 and the equivalent character codes for 'S', 't', 'r', 'i', 'n', 'g', and 'c' in that order.
The current location is then incremented by 8 to the location following the last item.

The second operation is equivalent to the following:

   db 0x12, 0x34, 0x56, 0x78

It formats 2-byte word units into memory, upper byte first.

Both of the operations db and dw can be followed by a comma-separated series of numeric values or addresses. In addition, db can accept strings, as shown in the examples above.

(i) CONDITIONAL assembly -- if (Ex) ST, if (Ex) ST else ST

These statements are used to selectively assemble different sets of statements. For example:

   if (STAND_ALONE) {
      at 0x03
         mov R0, #SP_IE0
         acall Pause
      reti
   } else {
      at 0x4003
         pop PSW
         mov R0, #SP_IE0
         acall Pause
      reti
   }

will assemble the first set of statements (at 0x03 ... reti) if the label STAND_ALONE is anything other than 0, and the second set (at 0x4003 ... reti) if the label is 0. An example with the exact same effect could be written as:

   if (STAND_ALONE) SEG equ 0; else SEG equ 0x4000
   at SEG + 3
      if (!STAND_ALONE) pop PSW
      mov R0, #SP_IE0
      acall Pause
   reti

Both the if and the else part of the conditional will accept only one statement. If more than one statement needs to be included, as in the first example, then they can be grouped within curly braces.

(j) Statement GROUPING -- { ... }, multiple statements on a line.

Any sequence of statements included within a matching set of curly brackets is treated as a single statement. It can then be used in the body of any conditional just like any single statement can.

SPECIAL NOTES ON STATEMENT FORMATTING:

(A) ALL STATEMENTS (a) THROUGH (h) MUST END IN SEMICOLONS.

However, this semicolon can be elided if it is the last item on a line. This allows compatibility with more traditional one-statement-on-a-line type assemblers. So normally, you don't even have to concern yourself with this if you adhere to a one-statement-per-line style.

(B) A BASIC STATEMENT ((a) THROUGH (h)) MUST BE WRITTEN ALL ON ONE LINE.

It cannot be split up into two or more lines.

(C) ALL COMMENTS ARE IN C++ STYLE.
Many assemblers use the semicolon to initiate comments. I have decided against this feature in favor of making this assembler more compatible with C++ syntax. Comments occur in the following two forms:

   (a) Anything included between a matching pair /* ... */
   (b) Anything included between a // and end of line.

However, for increased compatibility, I also allow the following format:

   (c) Anything included between a ;; and end of line.

My personal style is to precede comments with a ;;;, so none of this impinges on the software included in the archive with the assembler.

There is a short C program included that will blindly convert all single semicolons to double semicolons. Since I've observed that semicolons rarely occur inside string or character constants in actual 8051 programs, this should ALMOST always be sufficient to resolve any incompatibilities with your older assembly language programs.

(n) What goes in a *.s file, what goes in a *.h file?

Generally speaking, declarations should be placed in a *.h header file. The design of this assembler (especially with it being a one-pass assembler) is intended to support this usage. Any of the following is a declaration:

   (c) Defining new LABELS -- LABEL equ Exp, LABEL Type Exp
   (f) Declaring EXTERNAL labels -- extern Type LABEL, ..., LABEL
                                    extern equ LABEL, ..., LABEL

Declarations only meant to be accessed within one module should be made inside that module, instead of out in a header file.
The following should be used only in *.s files, as they are generally (a) used to create memory images, (b) used to define non-global objects, or (c) used to define address values:

   (a) FILE INCLUSION -- include FILE
   (b) Setting current SEGMENT and LOCATION -- seg, at, org
   (c) Defining new LABELS -- LABEL:
   (d) Numeric labels
   (e) Declaring GLOBAL labels -- global
   (g) Memory ALLOCATION -- ds, rb, rw
   (h) Memory FORMATTING - db, dw

The last two items are generally used in many different contexts, and so can be used anywhere:

   (i) CONDITIONAL assembly -- if (Ex) ST, if (Ex) ST else ST
   (j) Statement GROUPING -- { ... }

(3) Expressions

(a) Operators

The syntax is the same as in C. The following operations are defined:

   BIT-WISE:       ~, &, ^, |, <<, >>
   BOOLEAN:        !, &&, ||, <, <=, >, >=, ==, !=
   CONDITIONAL:    ? :
   ARITHMETIC:     prefix + and -, +, -, *, /, %
   CONVERSIONS:    high, low, by
   BIT CONVERSION: .

The operator precedences are all the same as in C. The latter two groups, not defined in C, are described in more detail below.

The operators high and low have the same precedence as all the other prefix operators (+, -, !, and ~). The operators "by" and "." bind more tightly than all the other infix operators (that is, they have the highest precedence among them), so for example

   A * B by C

is resolved as:

   A * (B by C)

and

   A.B + C

as:

   (A.B) + C

Parentheses may be used to enclose expressions as in C, for example:

   A + ((B << 2)&(C >> 3))

(b) CONVERSIONS ... high X, low X, H by L

The following examples illustrate these operations:

   high 1234h   (result: 12h .. the upper byte of the word 1234h)
   low 1234h    (result: 34h .. the lower byte of the word 1234h)
   12h by 34h   (result: 1234h)

(c) BIT-CONVERSION ... Dir.Pos

This is an 8051-specific operation related to the bit-addressing structure of the processor. The first argument represents a direct data register (of type "data" and value < 80h, or type "sfr" and value >= 80h). The second represents a bit position (0 through 7). The register, Dir, must be bit addressable.
These include only:

   data: 20h - 2fh
   sfr:  80h, 88h, 90h, 98h, 0a0h, 0a8h, 0b0h, 0b8h,
         0c0h, 0c8h, 0d0h, 0d8h, 0e0h, 0e8h, 0f0h, 0f8h

The sfr registers and bit positions generally have meanings defined by the manufacturer of the 8051 processor and vary between different versions of the 8051. They are not generally free to be defined by the programmer for arbitrary use. Most of them control or monitor the internal 8051 peripherals.

(d) LOCATION COUNTER -- $

A variable address that denotes the current location within the current segment.

NOTE:

   dw $, $ - 2, $ - 4

IS EQUIVALENT TO:

   dw $; dw $ - 2; dw $ - 4

which is equivalent to:

   1: dw 1b; dw 1b; dw 1b

The location counter advances in the middle of a dw or db.

(e) NUMERIC CONSTANT

This assembler accepts both the C numeric syntax and the Intel numeric syntax. The relation between the (extended) C notation and the Intel notation is illustrated below:

   HEXADECIMAL: 0xa44f = 0a44fh    0x23 = 23h
   DECIMAL:     23 = 23            23 = 23d
   OCTAL:       034 = 34q          056 = 56o
   BINARY:      0b1001 = 1001b

Upper case may be used anywhere lower case is used, so the above can be written as:

   HEXADECIMAL: 0XA44F = 0A44FH    0X23 = 23H
   DECIMAL:     23 = 23            23 = 23D
   OCTAL:       034 = 34Q          056 = 56O
   BINARY:      0B1001 = 1001B

(f) LABELS

Labels may consist of any sequence of letters, underscores (_), and digits, not starting with a digit. As with numbers, labels are CASE INSENSITIVE. So all of the following are equivalent:

   PPC, PPc, Ppc, pPC

(4) Referencing Expressions

At any time during assembly, a label may be in one of 3 states:

   (a) DEFINED and ABSOLUTE: This is either a numeric label, or a label denoting an address whose actual value is known.
   (b) DEFINED and RELATIVE: This is a label denoting an address whose location within its segment is known, but with the segment being relative.
   (c) UNDEFINED: This is a label that is either defined elsewhere in another file, or defined later on in the file currently undergoing processing.
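
As an aside on the conversions of section (3)(b) and the bit conversion of (3)(c), the arithmetic can be checked with a small Python model. This is only a sketch: the function names are hypothetical and not part of cas, and the bit-address formula is the standard 8051 mapping, which is assumed here (not confirmed by this document) to be the value that Dir.Pos evaluates to.

```python
def high(word):
    """Upper byte of a 16-bit word, e.g. high 1234h."""
    return (word >> 8) & 0xFF


def low(word):
    """Lower byte of a 16-bit word, e.g. low 1234h."""
    return word & 0xFF


def by(hi, lo):
    """Combine two bytes into a word, e.g. 12h by 34h."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def bit_address(direct, pos):
    """Standard 8051 bit address for bit `pos` of register `direct`.

    Assumption: this is what the Dir.Pos operator yields.
    """
    if not 0 <= pos <= 7:
        raise ValueError("bit position must be 0 through 7")
    if 0x20 <= direct <= 0x2F:
        # data registers 20h-2fh hold bit addresses 00h-7fh
        return (direct - 0x20) * 8 + pos
    if 0x80 <= direct <= 0xFF and direct % 8 == 0:
        # sfr registers 80h, 88h, ..., f8h hold bit addresses 80h-ffh
        return direct + pos
    raise ValueError("register is not bit addressable")
```

For instance, bit_address(0x20, 0) gives 0 and bit_address(0xD0, 7) gives 0xD7, while a register such as 81h is rejected because it is not in the list above.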
The following restrictions hold when using expressions:

   * Only ABSOLUTE labels can be used in any of the directives:
        at/org, ds/rb, rw, if (...)
   * Only DEFINED labels can be used on the right-hand side of any of the following directives:
        LABEL equ Exp, LABEL Type Exp
        LABEL set Exp, LABEL = Exp
   * Any expression can be used with any image generating statement:
        Mnemonics, db, dw

If the expression's value is not known at the time of assembly, then the corresponding location in the image is zeroed out. If the expression's value becomes known by the time the file is processed, the assembler will go back and fill in the zero with the appropriate value(s).

(5) Bugs (or "features")

(a) There is no way to tell the assembler to locate relatively addressed data registers in the directly addressable space. Consequently you may receive numerous errors during the linking phase telling you that such and such registers cannot be directly addressed. There are basically 2 ways to resolve this: (1) give the registers absolute addresses, (2) try listing the files in which these registers are defined first. The linker maps relative segments from the files in the order you list those files. In the makefile of the sample program provided (in 8051/assem/data), the linking phase is done with the command line:

   cas -o math.o data.o stdio.o kernel.o

This ordering resolves the problem.

(b) The assembler won't recognize UNIX-style newlines under DOS. Therefore, a conversion utility (nl.c) has been provided.

(c) No run-time checks are made against the object files processed. A corrupt object file will crash the assembler during the linking phase.
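
The zero-then-fill behaviour described in section (4) is what lets a one-pass assembler cope with forward references. The following Python sketch illustrates the general technique (often called backpatching). It is not the actual cas implementation: the class and method names are hypothetical, and only a dw-style two-byte address item is modelled, stored upper byte first as in the dw example of section (2)(h).

```python
class ImageSketch:
    """Toy one-pass image builder with deferred fix-ups (hypothetical)."""

    def __init__(self):
        self.data = bytearray()   # the memory image built so far
        self.labels = {}          # label name -> address
        self.fixups = []          # (offset, label) pairs still to be filled in

    def define(self, name):
        """Define `name` at the current location, like 'LABEL:'."""
        self.labels[name] = len(self.data)

    def dw(self, label):
        """Emit a 2-byte word holding the address of `label`."""
        offset = len(self.data)
        value = self.labels.get(label)
        if value is None:
            # Not known yet: remember the spot and emit zeros for now.
            self.fixups.append((offset, label))
            value = 0
        self.data += bytes([value >> 8, value & 0xFF])

    def resolve(self):
        """Go back and fill in every zeroed word whose label is now known."""
        remaining = []
        for offset, label in self.fixups:
            value = self.labels.get(label)
            if value is None:
                remaining.append((offset, label))  # would be left to the linker
            else:
                self.data[offset] = value >> 8
                self.data[offset + 1] = value & 0xFF
        self.fixups = remaining
        return remaining


if __name__ == "__main__":
    img = ImageSketch()
    img.dw("End")        # forward reference: zeroed out for now
    img.define("End")    # End is now known to be address 2
    img.dw("End")        # backward reference: filled in immediately
    img.resolve()
    print(img.data.hex())  # prints 00020002
```

Anything still listed by resolve() at the end of a file corresponds to the UNDEFINED state above, that is, a label that has to be supplied by another module at link time.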